URL
Use the URL connector to import content directly from web pages. The connector crawls and ingests publicly accessible web content and stores it in the Knowledge Repository for use in retrieval-augmented generation (RAG) queries.
When to use
- Quick ingestion of public documentation, blogs, or knowledge bases
- Periodic scraping of frequently updated pages
Usage
- Enter the page URL in the form, for example https://yoururl.com.
- The connector scrapes the page content. If the page contains images, they are scraped as well.
- For images, the connector uses OCR to extract their text.
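To make the scraping step more concrete, the following Python sketch shows one way to pull the visible text and the image URLs out of a page's HTML using only the standard library. It is an illustration under stated assumptions, not the connector's actual implementation: the PageScraper class and scrape function are hypothetical, the HTML is passed in as a string rather than fetched over the network, and the OCR step is omitted. In a real pipeline, each collected image URL would be downloaded and passed to an OCR engine such as Tesseract.

```python
# Hypothetical sketch of text and image extraction; not the connector's implementation.
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScraper(HTMLParser):
    """Collects visible text and absolute image URLs from an HTML page."""

    # Tags whose contents are not visible page text.
    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.text_parts: list[str] = []
        self.image_urls: list[str] = []
        self._skip_depth = 0  # > 0 while inside a script/style/noscript element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths like "/img/a.png" against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def scrape(html: str, base_url: str) -> tuple[str, list[str]]:
    """Return (page text, image URLs); an OCR step would then process the images."""
    parser = PageScraper(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls


html = """<html><head><style>body{}</style></head>
<body><h1>Docs</h1><p>Hello world.</p><img src="/img/diagram.png"></body></html>"""
text, images = scrape(html, "https://yoururl.com/docs/")
print(text)    # Docs Hello world.
print(images)  # ['https://yoururl.com/img/diagram.png']
```

Skipping script and style contents keeps code and CSS out of the ingested text, and resolving each image src with urljoin turns relative paths into absolute URLs that an OCR step could fetch.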
Notes
- Ensure the target URLs allow crawling and scraping.
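One way to check whether a site allows crawling is to consult its robots.txt file. The sketch below uses Python's standard urllib.robotparser module. The allowed_to_crawl helper is hypothetical, the rules are passed in as a string so the example runs offline, and the wildcard user agent "*" is an assumption, since the connector's own user-agent string is not documented here. To check a live site, you would instead call set_url("https://yoururl.com/robots.txt") followed by read() on the parser. A robots.txt check does not replace reviewing the site's terms of use.

```python
# Hypothetical helper for checking robots.txt rules before adding a URL.
from urllib.robotparser import RobotFileParser


def allowed_to_crawl(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = """User-agent: *
Disallow: /private/
"""
print(allowed_to_crawl(rules, "https://yoururl.com/docs/page"))      # True
print(allowed_to_crawl(rules, "https://yoururl.com/private/notes"))  # False
```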